Privacy-Awareness of Distributed Data Clustering Algorithms Revisited
Authors
Abstract
Several privacy measures have been proposed in the privacy-preserving data mining literature. However, existing measures either assume a centralized data source or assume that no insider will try to infer information. This paper presents distributed privacy measures that take into account collusion attacks and point-level breaches in distributed data clustering. An analysis of representative distributed data clustering algorithms shows that collusion is an important source of privacy issues and that the analyzed algorithms exhibit different vulnerabilities to collusion groups.
Similar Resources
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and the social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into existing clustering algorithms. One of the well-researched constrained clustering algorithms is microaggregation. In a microaggreg...
A Survey on Location Based Services in Data Mining
Data privacy has been a primary concern since distributed databases came into the picture. More than two parties have to combine their data for the data mining process without revealing it to the other parties. Continuous advancement in mobile networks and positioning technologies has created a strong challenge for location-based applications. Challenges resembling location-aware emergency respo...
Comparison of distributed evolutionary k-means clustering algorithms
Dealing with distributed data is one of the challenges for clustering, as most clustering techniques require the data to be centralized. One of them, k-means, has been elected as one of the most influential data mining algorithms for being simple, scalable, and easily modifiable to a variety of contexts and application domains. However, exact distributed versions of k-means are still sensitive ...
An Efficient Distributed Data Clustering Algorithm
The k-means algorithm is one of the most popular clustering algorithms in use today. The high running-time complexity of serial k-means limits its applicability to very large databases. On the other hand, existing parallel k-means algorithms demand large data transfers and incur high communication complexity. Transfer of actual data from local sites is also unacceptable, in man...